Adapting the OCMiner text processing system to the CTD controlled vocabulary

Authors

  • Matthias Irmer
  • Claudia Bobach
  • Timo Böhme
  • Ulf Laube
  • Anett Püschel
  • Lutz Weber
Abstract

We adapted OCMiner, a modular text processing pipeline designed for high-speed processing of large document collections, to the controlled vocabulary of the Comparative Toxicogenomics Database (CTD). We provide a RESTful web service that processes documents in the BioCreative XML format and annotates them with domain-specific terms from four CTD domains: genes, chemicals, diseases, and action terms.
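As a rough illustration of what annotating text against a controlled vocabulary involves, here is a minimal Python sketch of dictionary-based term matching. It is not OCMiner's implementation and does not use the BioCreative XML format or the web service interface, neither of which is specified here. The `VOCAB` table, its placeholder identifiers, and the `annotate` function are all hypothetical names introduced for this example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary: term -> (domain, identifier).
# The identifiers are placeholders, not real CTD accession numbers.
VOCAB: Dict[str, Tuple[str, str]] = {
    "aspirin": ("chemical", "CHEM:0001"),
    "tp53": ("gene", "GENE:0001"),
    "asthma": ("disease", "DIS:0001"),
    "increases": ("action", "ACT:0001"),
}

# Longest terms first, so a longer term wins over a shorter one
# that starts at the same position.
_TERMS = sorted(VOCAB, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in _TERMS) + r")\b",
    re.IGNORECASE,
)


def annotate(text: str) -> List[dict]:
    """Return one annotation per vocabulary term found, in text order."""
    annotations = []
    for match in _PATTERN.finditer(text):
        domain, identifier = VOCAB[match.group(1).lower()]
        annotations.append(
            {
                "start": match.start(),
                "end": match.end(),
                "text": match.group(0),
                "domain": domain,
                "id": identifier,
            }
        )
    return annotations


if __name__ == "__main__":
    for hit in annotate("Aspirin increases TP53 expression in asthma."):
        print(hit)
```

A production system would add linguistic normalization, synonym handling, and disambiguation on top of plain string matching; this sketch only shows the mapping from text spans to vocabulary entries.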

Similar resources

An Overview of the BioCreative Workshop 2012 Track III: Interactive

The BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The BioCreative Workshop 2012 subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought communi...

OCMiner: Text Processing, Annotation and Relation Extraction for the Life Sciences

We present OCMiner, a high-performance text processing system for large document collections of scientific publications. Several linguistic options allow adjusting the quality of annotation results which can be specialized and fine-tuned for the recognition of Life Science terms. Recognized terms are mapped to semantic concepts which are ontologically located within their respective domain taxo...

The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database

The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel...

Vodcast: A Breakthrough in Developing Incidental Vocabulary Learning

Incidental vocabulary learning is often seen as superior to direct instruction on many occasions. Meanwhile, upon the emergence of the World Wide Web, second language (SL) learners have been introduced to 'podcasts' (recorded audio and video online broadcasts) which could be authentic sources of vocabulary learning. The relatively recent phenomenon of video podcast (vodcast) might be considered...

MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database

The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators manually curate a triad of chemical-gene, chemical-disease and gene-disease relationships from the scientific literature. The CTD curation paradigm uses controlled vocabularies for chemicals, genes and diseases. To curate di...

Publication date: 2013